Introduction¶

    I want to evaluate economic data related to poverty and inequality. With this data, I want to see how economic indicators change over time (by year) for select countries in Asia. The four countries I chose -- China, Japan, Mongolia, and the Philippines -- were selected based on the availability of data. I also want to know whether this economic data correlates with measures of freedom. The economic data comes from the World Bank's Poverty and Inequality dataset because it contains a number of well-documented poverty and inequality indicators. The freedom data comes from Freedom House, which measures the levels of freedom of different countries and tracks global trends in political rights and civil liberties. Freedom House doesn't have as much data as would be ideal, but I'm interested in the relationship between the economy and freedom, so I picked it anyway.  
    I will query the World Bank's Poverty and Inequality Platform API and store the results as a dataframe. The Freedom House data comes from the CSV that Freedom House uses to store its data, and it will also be stored as a dataframe. Then, I will clean both the World Bank and Freedom House data sets and combine them into one dataframe. With this dataframe, I will summarize descriptive statistics and look at general trends, focusing on mean values for each country for GDP Per Capita, the Percentage of Impoverished People, the Gini Index, and Total Freedom Score. 
    Then, I will move into visualization and analysis. The variables I will be examining are Year, the GDP Per Capita, the Percentage of Impoverished People, the Economic Freedom Scores, the Political Freedom Scores, the Legal Freedom Scores, and the Total Freedom Scores. I will be looking at how these variables change across time, and also how they correlate to each other through scatterplots and line graphs. 
    Most of the correlations between World Bank variables depicted in the visualizations are intuitive. For example, GDP per capita increases over the years, and as GDP per capita increases, the percentage of impoverished people decreases. However, some of the correlations involving the freedom variables from Freedom House are much less intuitive. For example, freedom does not seem to be increasing over the years. There also doesn't seem to be a strong correlation between economic indicators and freedom scores.
In [1]:
# packages
import requests

# data manipulation
import pandas as pd 
import numpy as np

# data visualization
import plotly.express as px

Data¶

World Bank API¶

In [2]:
# World Bank Poverty and Inequality Platform (PIP) API endpoint
url = 'https://api.worldbank.org/pip/v1/pip?country=all&year=all&fill_gaps=false&additional_ind=false&ppp_version=2017'
In [3]:
# set parameters
parameters = {'country': 'all',
             'year': 'all',
              'Region': 'all'}
In [4]:
r = requests.get(url, params = parameters)
r.url
Out[4]:
'https://api.worldbank.org/pip/v1/pip?country=all&year=all&fill_gaps=false&additional_ind=false&ppp_version=2017&country=all&year=all&Region=all'
In [5]:
# check to ensure link working
r.status_code
Out[5]:
200
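A status code of 200 only says that the server answered. Because `url` already carries a query string and `parameters` adds `country` and `year` again, the final URL repeats those keys. A minimal sketch of inspecting a composed URL with the standard library; the URL below is an abbreviated stand-in for the one in Out[4], not the exact request:

```python
from urllib.parse import urlsplit, parse_qs

# Abbreviated form of the request URL: same repeated keys, fewer parameters.
request_url = (
    "https://api.worldbank.org/pip/v1/pip"
    "?country=all&year=all&ppp_version=2017"
    "&country=all&year=all&Region=all"
)

# parse_qs collects every value seen for a key into a list.
query = parse_qs(urlsplit(request_url).query)

# Keys that were sent more than once.
duplicated = sorted(key for key, values in query.items() if len(values) > 1)
print(duplicated)
```

Passing every parameter through `params` alone, with a bare endpoint as `url`, would avoid the repetition.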
In [6]:
json_results = r.json()
wb = pd.DataFrame(json_results)
wb.columns
Out[6]:
Index(['region_name', 'region_code', 'country_name', 'country_code',
       'reporting_year', 'reporting_level', 'survey_acronym',
       'survey_coverage', 'survey_year', 'welfare_type',
       'survey_comparability', 'comparable_spell', 'poverty_line', 'headcount',
       'poverty_gap', 'poverty_severity', 'watts', 'mean', 'median', 'mld',
       'gini', 'polarization', 'decile1', 'decile2', 'decile3', 'decile4',
       'decile5', 'decile6', 'decile7', 'decile8', 'decile9', 'decile10',
       'cpi', 'ppp', 'reporting_pop', 'reporting_gdp', 'reporting_pce',
       'is_interpolated', 'distribution_type', 'estimation_type', 'spl', 'spr',
       'pg', 'estimate_type'],
      dtype='object')
In [7]:
# select desired variables/columns
wb = wb[['reporting_year', 'reporting_level', 'country_name', 'reporting_pop', 'reporting_gdp', 'headcount', 'poverty_severity', 'gini']]
In [8]:
# rename columns
wb = wb.rename(columns = {'headcount': 'proportion_impoverished', 'reporting_year': 'year', 'country_name': 'country', 'gini': 'gini_index', 'reporting_pop': 'population', 'reporting_gdp': 'gdp_per_capita'})

# select desired countries in Asia
wb_asia = wb[wb['country'].isin(['China', 'Japan', 'Mongolia', 'North Korea', 'Philippines', 'South Korea', 'Taiwan'])]
In [9]:
# wb surveys national, urban, rural, but we just want the national level
wb_asia = wb_asia[wb_asia['reporting_level'] == 'national']
In [10]:
wb_asia.head()
Out[10]:
year reporting_level country population gdp_per_capita proportion_impoverished poverty_severity gini_index
388 1981 national China 9.938850e+08 447.1198 0.9178 0.2804 0.2816
391 1984 national China 1.036825e+09 596.2011 0.8119 0.1718 0.2710
394 1987 national China 1.084035e+09 786.8649 0.6739 0.1260 0.2939
397 1990 national China 1.135185e+09 905.0325 0.7196 0.1397 0.3223
400 1993 national China 1.178440e+09 1239.1294 0.6557 0.1177 0.3388
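Filtering by name returns no rows, and raises no error, for a name the data does not contain. A small sketch of checking for that on a toy frame; the country names in `wb_toy` are assumptions for illustration, not the API's actual values:

```python
import pandas as pd

# Toy stand-in for the `wb` dataframe; the names are illustrative assumptions.
wb_toy = pd.DataFrame({"country": ["China", "Japan", "Korea, Rep.", "Mongolia"]})

wanted = ["China", "Japan", "Mongolia", "North Korea", "South Korea"]

# Names that were requested but never appear in the data.
available = set(wb_toy["country"])
missing = [name for name in wanted if name not in available]
print(missing)
```

Running the same check against the real `wb['country']` column would show which requested countries drop out before the merge.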

World Bank Documentation

  • headcount = proportion of impoverished people in population
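  • poverty_severity = squared poverty gap index: the mean of the squared proportional shortfalls from the poverty line (not the square of poverty_gap)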
  • mean = average daily household per capita income or consumption expenditure
  • median = median of daily household per capita income or consumption expenditure
  • gini_index = extent to which the distribution of income among individuals or households within an economy deviates from a perfectly equal distribution
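The three poverty measures are easiest to tell apart on a small example. The sketch below uses the standard headcount, poverty gap, and squared poverty gap definitions on hypothetical, unweighted incomes; the World Bank's own estimates come from survey distributions, so this only illustrates the formulas:

```python
import numpy as np

# Hypothetical daily incomes and a hypothetical poverty line, same units.
incomes = np.array([1.0, 1.5, 2.0, 4.0, 10.0])
poverty_line = 2.0

# Proportional shortfall from the line; zero for anyone at or above it.
shortfall = np.clip((poverty_line - incomes) / poverty_line, 0, None)

headcount = np.mean(incomes < poverty_line)   # share of people below the line
poverty_gap = np.mean(shortfall)              # mean proportional shortfall
poverty_severity = np.mean(shortfall ** 2)    # mean squared shortfall

print(headcount, poverty_gap, poverty_severity)
```

Here the severity (0.0625) differs from the square of the poverty gap (0.15 squared is 0.0225), because the squaring happens per person before averaging, which puts more weight on the poorest.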
In [11]:
# to csv if desired
# wb_asia.to_csv('wb_asia.csv', index = False)

Freedom House¶

In [12]:
# read in freedom house csv
freedom_house = pd.read_csv('freedom_house.csv')
In [13]:
# remove empty columns and rows (csv has blank spaces)
freedom_house = freedom_house.iloc[:,0:7].iloc[0:105]

# convert year from float to int (removes the trailing .0) so the merge with wb is easier
freedom_house['Year'] = freedom_house['Year'].astype(int)

# rename columns so merge easier and more intuitive
freedom_house = freedom_house.rename(columns = {'Country': 'country', 'Year': 'year', 'A-Legal': 'legal_freedom', 'B-Political': 'political_freedom', 'C-Economic': 'economic_freedom', 'Total Score': 'total_score', 'Status': 'status'})
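Slicing with fixed positions works for this particular file but depends on exactly where the blank rows and columns fall. A hedged alternative, shown on a toy frame with placeholder values (the column name `Unnamed: 3` and the rows are assumptions, not the real CSV), is to drop whatever is entirely empty:

```python
import numpy as np
import pandas as pd

# Toy frame imitating a CSV with a blank trailing column and a blank row.
raw = pd.DataFrame({
    "Country": ["A", "B", np.nan],
    "Year": [2002.0, 2002.0, np.nan],
    "Total Score": [10.0, 20.0, np.nan],
    "Unnamed: 3": [np.nan, np.nan, np.nan],
})

# Drop columns, then rows, that contain no data at all.
clean = raw.dropna(axis="columns", how="all").dropna(axis="index", how="all")

# With the blank rows gone, Year converts cleanly to an integer.
clean = clean.assign(Year=clean["Year"].astype(int))
print(clean.shape)
```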

Freedom House Documentation: Higher scores indicate less freedom

  • Legal Environment
  • Political Environment
  • Economic Environment

For Total Score

  • Free = 0 - 30
  • Partly Free = 31 - 60
  • Not Free = 61 - 100
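The bands above can be reproduced from a total score, which is a handy cross-check on the `status` column. A minimal sketch with `pd.cut`; the helper name `status_from_total` is hypothetical, and the CSV itself appears to use abbreviations such as NF rather than these full labels:

```python
import pandas as pd

def status_from_total(total_score: pd.Series) -> pd.Series:
    """Map 0-100 total scores to the three status bands listed above."""
    return pd.cut(
        total_score,
        bins=[0, 30, 60, 100],
        labels=["Free", "Partly Free", "Not Free"],
        include_lowest=True,  # so a score of exactly 0 lands in "Free"
    )

# Scores on each side of every band edge.
scores = pd.Series([0, 30, 31, 60, 61, 100])
print(status_from_total(scores).tolist())
```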

Freedom House (fh) and World Bank (wb) Data¶

In [14]:
# merge wb and fh based on country and year
fh_wb_merge = pd.merge(wb_asia, freedom_house, on = ['country', 'year'], how = 'inner')

# drop duplicate rows
fh_wb_merge = fh_wb_merge.drop_duplicates()

# reorder columns to be more intuitive
fh_wb_merge.columns
fh_wb_merge = fh_wb_merge[['country', 'year', 'population', 'gdp_per_capita', 'proportion_impoverished', 'poverty_severity', 'gini_index', 'legal_freedom', 'political_freedom', 'economic_freedom', 'total_score', 'status']]
fh_wb_merge.head()
Out[14]:
country year population gdp_per_capita proportion_impoverished poverty_severity gini_index legal_freedom political_freedom economic_freedom total_score status
0 China 2002 1.280400e+09 2557.8916 0.3689 0.0543 0.4199 26.0 34.0 20.0 80.0 NF
1 China 2005 1.303720e+09 3390.7162 0.2225 0.0229 0.4090 27.0 34.0 22.0 83.0 NF
2 China 2008 1.324655e+09 4711.6434 0.1727 0.0184 0.4297 28.0 35.0 22.0 85.0 NF
3 China 2010 1.337705e+09 5647.0687 0.1338 0.0116 0.4374 29.0 34.0 22.0 85.0 NF
4 China 2011 1.345035e+09 6152.6969 0.0995 0.0074 0.4241 29.0 34.0 22.0 85.0 NF
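An inner merge keeps only country-year pairs present in both sources, so it is worth knowing how many rows each side loses. A sketch on toy frames (the country-year pairs are illustrative only) using the `indicator` option of `pd.merge`:

```python
import pandas as pd

# Toy frames; the country-year pairs are illustrative only.
left = pd.DataFrame({"country": ["China", "China", "Japan"],
                     "year": [2002, 2003, 2002]})
right = pd.DataFrame({"country": ["China", "Japan", "Japan"],
                      "year": [2002, 2002, 2004]})

# An outer merge keeps everything; `_merge` records where each row came from.
audit = pd.merge(left, right, on=["country", "year"], how="outer", indicator=True)
counts = audit["_merge"].value_counts().to_dict()
print(counts)
```

Rows marked `both` are the ones an inner merge would keep; `left_only` and `right_only` rows are the ones it drops.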
In [15]:
# take the natural log of gdp per capita (changes in the log approximate growth rates)
fh_wb_merge['log_gdp'] = np.log(fh_wb_merge['gdp_per_capita'])

# convert proportion_impoverished to percentage
percentage_impoverished = np.array(fh_wb_merge['proportion_impoverished'])
percentage_impoverished = percentage_impoverished * 100 # to percentage
fh_wb_merge['percentage_impoverished'] = percentage_impoverished # create column
fh_wb_merge = fh_wb_merge.drop(columns = 'proportion_impoverished') # drop original column

# convert poverty_severity to percentage
percentage_poverty_severity = np.array(fh_wb_merge['poverty_severity'])
percentage_poverty_severity = percentage_poverty_severity * 100 # to percentage
fh_wb_merge['percentage_poverty_severity'] = percentage_poverty_severity # create column
fh_wb_merge = fh_wb_merge.drop(columns = 'poverty_severity') # drop original column

# move log_gdp position to be near gdp_per_capita
col = fh_wb_merge.pop('log_gdp')
fh_wb_merge.insert(4, 'log_gdp', col)
fh_wb_merge.head()
Out[15]:
country year population gdp_per_capita log_gdp gini_index legal_freedom political_freedom economic_freedom total_score status percentage_impoverished percentage_poverty_severity
0 China 2002 1.280400e+09 2557.8916 7.846939 0.4199 26.0 34.0 20.0 80.0 NF 36.89 5.43
1 China 2005 1.303720e+09 3390.7162 8.128796 0.4090 27.0 34.0 22.0 83.0 NF 22.25 2.29
2 China 2008 1.324655e+09 4711.6434 8.457792 0.4297 28.0 35.0 22.0 85.0 NF 17.27 1.84
3 China 2010 1.337705e+09 5647.0687 8.638892 0.4374 29.0 34.0 22.0 85.0 NF 13.38 1.16
4 China 2011 1.345035e+09 6152.6969 8.724646 0.4241 29.0 34.0 22.0 85.0 NF 9.95 0.74
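The log of GDP per capita is a level, not a rate; growth shows up as the change in the log over time. A minimal sketch using the 2002 and 2005 China rows from the table above:

```python
import numpy as np
import pandas as pd

# Two China rows from the merged table above.
china = pd.DataFrame({
    "year": [2002, 2005],
    "gdp_per_capita": [2557.8916, 3390.7162],
})

log_gdp = np.log(china["gdp_per_capita"])

# Change in log GDP divided by the years elapsed: average annual log growth.
annual_growth = log_gdp.diff() / china["year"].diff()
print(round(annual_growth.iloc[1], 3))
```

The result is roughly 0.094, or about 9.4% growth per year over that three-year gap. Dividing by the year difference matters here because the survey years are not evenly spaced.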

Descriptive Statistics¶

Calculating Means

In [16]:
# group by country
means = fh_wb_merge.drop(columns = ['year']).groupby('country').agg({'gdp_per_capita': 'mean',
                                                                     'percentage_impoverished': 'mean',
                                                                     'gini_index': 'mean', 
                                                                     'total_score': 'mean'})
In [17]:
# based on gdp
means.sort_values(by = 'gdp_per_capita')
Out[17]:
gdp_per_capita percentage_impoverished gini_index total_score
country
Philippines 2387.541680 11.810000 0.439810 43.000000
Mongolia 2990.149543 2.402857 0.334014 36.571429
China 6017.386200 11.312000 0.410250 84.500000
Japan 33579.911233 0.403333 0.332633 22.333333
Sorted by mean GDP per capita, the four countries show no clear pattern in the percentage of impoverished people, the Gini index, or the total score. 
In [18]:
# based on percentage impoverished (and gini basically)
means.sort_values(by = 'percentage_impoverished')
Out[18]:
gdp_per_capita percentage_impoverished gini_index total_score
country
Japan 33579.911233 0.403333 0.332633 22.333333
Mongolia 2990.149543 2.402857 0.334014 36.571429
China 6017.386200 11.312000 0.410250 84.500000
Philippines 2387.541680 11.810000 0.439810 43.000000
Intuitively, as the percentage of impoverished people increases, so does the Gini index. In other words, across these four countries, a greater percentage of impoverished people goes with greater inequality. Countries with a higher percentage of impoverished people and a higher Gini index also tend to have a higher total score, meaning less freedom (China is an outlier, as expected, with by far the highest total score). 
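The comparisons above are made by eye from sorted tables. A sketch of putting numbers on them with `DataFrame.corr`, using the country means copied from the output above; with only four countries, any coefficient here is fragile and should be read as descriptive only:

```python
import pandas as pd

# Country means copied from the tables above.
means_table = pd.DataFrame(
    {
        "gdp_per_capita": [2387.541680, 2990.149543, 6017.386200, 33579.911233],
        "percentage_impoverished": [11.810000, 2.402857, 11.312000, 0.403333],
        "gini_index": [0.439810, 0.334014, 0.410250, 0.332633],
        "total_score": [43.000000, 36.571429, 84.500000, 22.333333],
    },
    index=["Philippines", "Mongolia", "China", "Japan"],
)

# Pairwise Pearson correlations between the four mean columns.
corr = means_table.corr()
print(corr.round(2))
```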

Analysis¶

Year and Log of GDP Per Capita

In [19]:
fig = px.line(
    fh_wb_merge, 
    x = 'year', 
    y = 'log_gdp', # from wb
    labels = {'year': 'Year', 'log_gdp': 'Log of GDP Per Capita'},
    hover_name = 'country',  
    title = 'Log of GDP Per Capita over the Years for Countries in Asia',
    color = 'country')

# Show the plot
fig.show()
Log GDP per capita trends upward with year for these countries in Asia. Because the slope of a log series approximates its growth rate, an upward-sloping line means GDP per capita grew at a positive rate for the most part.

Year and % Impoverished

In [20]:
fig = px.line(
    fh_wb_merge, 
    x = 'year', 
    y = 'percentage_impoverished', # from wb, technically headcount
    labels = {'year': 'Year', 'percentage_impoverished': '% Impoverished in Population'},
    hover_name = 'country',  
    title = '% Impoverished Over the Years for Countries in Asia',
    color = 'country')

# Show the plot
fig.show()
There seems to be a negative correlation between year and the percent of the population in poverty, aka headcount, for these countries in Asia. As the year increases, the percent of people in poverty generally decreases.

Year and Economic Freedom Score

In [21]:
fig = px.line(
    fh_wb_merge, 
    x = 'year', 
    y = 'economic_freedom', # from fh
    labels = {'year': 'Year', 'economic_freedom': 'Economic Freedom Scores'},
    title = "Economic Freedom Scores over the Years for Countries in Asia",
    hover_name = 'country',
    color = 'country')
    
fig.update_traces(
    hovertemplate=(
        'Country: %{hovertext}<br>'  # Hover text
        'Economic Freedom: %{y}<br>'  # Economic freedom value
        'Year: %{x}<br>'  # Year
        'Disclaimer: As the economic freedom score increases, economic freedom decreases.<br>'  # Disclaimer
))

# Show the plot
fig.show()
There doesn't seem to be a strong correlation between year and economic freedom. Instead, economic freedom scores stay roughly flat across years. An analysis covering more years or countries would prove useful. 

Year and Political Freedom Score

In [22]:
fig = px.line(
    fh_wb_merge, 
    x = 'year', 
    y = 'political_freedom', # from fh
    labels = {'year': 'Year', 'political_freedom': 'Political Freedom Scores'},
    title = "Political Freedom Scores over the Years for Countries in Asia",
    hover_name = 'country',
    color = 'country')

fig.update_traces(
    hovertemplate=(
        'Country: %{hovertext}<br>'  # Hover text
        'Political Freedom: %{y}<br>'  # Political freedom value
        'Year: %{x}<br>'  # Year
        'Disclaimer: As the political freedom score increases, political freedom decreases.<br>'  # Disclaimer
))

# Show the plot
fig.show()
There doesn't seem to be a strong correlation between year and political freedom. Instead, the amount of political freedom seems to be flatlining across years. 

Year and Legal Freedom Score

In [23]:
fig = px.line(
    fh_wb_merge, 
    x = 'year', 
    y = 'legal_freedom', # from fh
    labels = {'year': 'Year', 'legal_freedom': 'Legal Freedom Scores'},
    title = "Legal Freedom Scores over the Years for Countries in Asia",
    hover_name = 'country',
    color = 'country')

fig.update_traces(
    hovertemplate=(
        'Country: %{hovertext}<br>'  # Hover text
        'Legal Freedom: %{y}<br>'  # Legal freedom value
        'Year: %{x}<br>'  # Year
        'Disclaimer: As the legal freedom score increases, legal freedom decreases.<br>'  # Disclaimer
))

# Show the plot
fig.show()
There seems to be a positive correlation between year and the legal freedom score. As the year increases, the score rises, so the amount of legal freedom decreases (higher freedom scores indicate less freedom).

GDP Per Capita and Percent of Population in Poverty

In [24]:
fig = px.scatter(
    fh_wb_merge, 
    x = 'gdp_per_capita', # from wb; using log_gdp instead would compress the large skew
    y = 'percentage_impoverished', # from wb
    labels = {'gdp_per_capita': 'GDP Per Capita', 'percentage_impoverished': '% Impoverished in Population'},
    hover_name = 'country',  
    title = 'GDP Per Capita Compared to % Impoverished People',
    color = 'country')

# Show the plot
fig.show()
There seems to be a negative correlation between gross domestic product (GDP) per capita and the percent of the population in poverty. As GDP per capita increases, the percent of people in poverty decreases. This is intuitive, as GDP per capita (output per person) is considered a broad measure of economic growth. It makes sense that the percent of impoverished people decreases when output per person increases and the economy is growing.
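The direction read off the scatterplot can be checked numerically. A sketch on the five China rows shown in the merged table earlier, computing a Pearson correlation and a rank correlation (Pearson applied to ranks, which only asks whether the relationship is monotonic); five points from one country are not evidence about the full dataset:

```python
import pandas as pd

# The five China rows shown in the merged table earlier.
china = pd.DataFrame({
    "gdp_per_capita": [2557.8916, 3390.7162, 4711.6434, 5647.0687, 6152.6969],
    "percentage_impoverished": [36.89, 22.25, 17.27, 13.38, 9.95],
})

# Pearson correlation on the raw values.
pearson = china["gdp_per_capita"].corr(china["percentage_impoverished"])

# Rank correlation: Pearson on ranks instead of raw values.
ranks = china.rank()
rank_corr = ranks["gdp_per_capita"].corr(ranks["percentage_impoverished"])
print(pearson, rank_corr)
```

In these rows poverty falls every time GDP per capita rises, so the rank correlation is -1 even though the relationship is not a straight line.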

Percentage Poverty Severity and Total Score

In [25]:
# wb and fh data
fig = px.scatter(
    fh_wb_merge, 
    x = 'percentage_poverty_severity', # from wb
    y = 'total_score', # from fh
    labels = {'percentage_poverty_severity': 'Poverty Severity (Squared Poverty Gap, x100)', 'total_score': 'Total Freedom Scores'},
    hover_name = 'country',  
    title = 'Poverty Severity Compared to Total Freedom Score',
    color = 'country')

fig.update_traces(
    hovertemplate=(
        'Country: %{hovertext}<br>'  # Hover text
        'Total Freedom: %{y}<br>'  # Total freedom value
        'Poverty Severity: %{x}<br>'  # Poverty severity value
        'Disclaimer: As the total freedom score increases, total freedom decreases.<br>'  # Disclaimer
))

# Show the plot
fig.show()
There seems to be a negative correlation between poverty severity (the squared poverty gap, which weights the poorest more heavily) and the total freedom score. Looking at Freedom House's documentation, higher values of total freedom (total_score) indicate less freedom. As such, as poverty severity increases, the amount of freedom increases, which seems counterintuitive. Given that total freedom encompasses economic freedom as well, we would expect the total freedom score to increase (becoming less free) when poverty severity increases.

Conclusion¶

    Using data from the World Bank and Freedom House, I wanted to evaluate economic data related to poverty and inequality and see how the variables correlated with each other and with freedom scores. I selected four countries in Asia -- China, Japan, Mongolia, and the Philippines -- due to the availability of data. 
    Most of the correlations between World Bank variables depicted in the visualizations were intuitive. For example, GDP per capita increases over the years, and as GDP per capita increases, the percentage of impoverished people decreases. However, some of the correlations involving the freedom variables from Freedom House were much less intuitive. For example, freedom does not seem to be increasing over the years, and total freedom score and inequality indicators seem to have inverse relationships. It could prove interesting to take a closer look at these relationships, for example by examining the correlation between different types of government (autocratic, democratic, etc.) and economic prosperity. 
    This analysis has many limitations. Freedom House lacked proper documentation for a large number of years and countries, which made it difficult to match the well-documented countries and years of the World Bank. As a result, only a small number of countries over a short amount of time were examined. An analysis of more countries over a greater span of time could yield very different results. For example, as a possible next step, it may prove useful to analyze Singapore, an outlier that doesn't follow the conventional idea that democracy is necessary to prosper economically. An alternative data source that could have been selected instead is Varieties of Democracy (V-Dem), which contains democracy ratings. V-Dem has better documentation for years and countries, and would have paired well with the economic data from the World Bank. However, I was primarily interested in measures of freedom beyond just political and democracy scores, which is why I didn't choose it.